Chapter-1: Review of Socio-Demographic Input Data

1. Introduction

Transit ridership and highway volumes for the Corridor studies are obtained by running alternative scenarios in the STOPS and SERPM models. The two models were developed differently and use different input data sets, although some of the data in both comes from the MPOs and transit agencies. The transit inputs are in two different formats: STOPS uses GTFS, whereas SERPM uses CUBE PT. Although the two formats are not directly comparable, both should represent the transit networks. The other input is the socio-demographic (landuse) file. It exists in both models, but again in different formats and at different geographic levels. Since the transit ridership estimates for the Corridor studies are expected to rely on the STOPS model, the transit networks in STOPS and SERPM are not compared here. The landuse data, however, is studied in detail.

The 2015 socio-economic data is developed through linear interpolation. Currently, the 2015 SE data exists in two different models:

  1. SEFL STOPS model (MPO0000TAZPopEmp.dbf)
  2. SERPM MAZ data (maz_data.csv)

The STOPS SE data is at the TAZ level, whereas the SERPM data is at the MAZ level. To compare the two inputs, the SERPM MAZ-level data is aggregated to TAZ, and this document reports all findings at the TAZ level. In principle, the SE data in the two models should be identical and should originate from the same source. However, because both models (STOPS and SERPM) were being updated continuously, their 2040 data could differ. This document summarizes those differences.
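The MAZ-to-TAZ aggregation can be illustrated with a minimal sketch in base R. It uses the six 2040 MAZ records (Corradino values) listed in Section 3.2; the column names `mgra`, `TAZ`, `hh`, `pop` and `emp_total` are assumed from that table's header, and the actual script may aggregate differently (for example with dplyr).

```r
# Six 2040 MAZ records (Corradino values) copied from the table in Section 3.2.
# Column names are assumed from that table's header.
maz <- data.frame(
  mgra      = 1:6,
  TAZ       = c(2901, 2902, 2903, 2903, 2903, 2903),
  hh        = c(43, 9, 497, 273, 383, 212),
  pop       = c(169, 23, 1694, 984, 1306, 861),
  emp_total = c(0, 1337, 379, 21, 86, 14)
)

# Sum every MAZ-level variable within its parent TAZ.
taz <- aggregate(cbind(hh, pop, emp_total) ~ TAZ, data = maz, FUN = sum)
print(taz)
```

The four MAZs in TAZ 2903 collapse into a single TAZ record, which is the level at which the STOPS data is reported.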

The 2040 SE data exists in the following model locations:

  1. SERPM v7.602 from FDOT website
  2. Corradino delivered model
  3. SEFL STOPS
#path <- getwd()
path <-"/Volumes/C/projects/SERPM_Compare/check_seData"

# data directories
dir <- "Corradino_SEData"
fdot.dir <- "FDOT_June_30_2016"
stops.dir <- "STOPS_SEData"
  
# file names 
# TODO (ans): Replace maz_data.csv files with model_data.csv (which is more comprehensive data)
maz.data.files <- c("2010_maz_data.csv", "2015_maz_data.csv", "2040_maz_data.csv")
fdot.maz.files <- c("maz_data_IN_2040R.csv", "maz_data_IN_2040T.csv")
stops_mpo_shapeFile <- "simplified_MPOTAZPopEmp.shp"
taz_county_file <- "taz_county.csv"

# list of TAZs to check
check_taz <- c(76, 387, 979, 1596, 1598, 2253)

# Save R Objects for later use
save.RData.outputs <- TRUE

2. SERPM Landuse Data

The SERPM maz data files are developed by FDOT with feedback from various agencies, including the three MPOs in the region. The future-year maz data file is updated continuously to reflect revised population and employment projections. As a result, there are several versions of the 2040 data with significant differences in the population, household and employment variables. As part of the Corridor studies effort, it is necessary to document the source of the model data being used and to validate that data.

The model data files delivered by Corradino were reviewed for consistency across the three horizon years: 2010, 2015 and 2040. Some of the data fields are not consistent across all years. Two fields, geoSRate and geoSRateNm, exist in some maz_data.csv files but not in all.

2.1 Check 2015 Trend

The 2015 SE data is developed through linear interpolation, so each 2015 value should lie between the corresponding 2010 and 2040 values. This section of the code checks for zones where households drop from 2010 to 2015 and then rise again by 2040, or the reverse. The following table shows households for 2010, 2015 and 2040 (differences of 5 households or fewer are ignored).

Note: The household variable in this maz_data.csv file is computed by aggregating PopSyn-3 outputs, so there is some over- or under-estimation of households at the MAZ level when compared to PopSyn-3 inputs. The differences shown in the data below are within a reasonable range.

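check_hh <- data_all_years %>%
   # replace missing values with zero in every column
   mutate(across(everything(), ~ replace(.x, is.na(.x), 0))) %>%
   mutate(diff_1015 = hh_2015 - hh_2010,
          diff_1540 = hh_2040 - hh_2015,
          # element-wise & and | are needed here; && and || only accept single values
          check = ifelse((diff_1015 > 0 & diff_1540 < 0) | (diff_1015 < 0 & diff_1540 > 0), 1, 0)) %>%
   # keep reversals where both changes exceed 5 households
   filter(check == 1, abs(diff_1540) > 5, abs(diff_1015) > 5)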

check_hh <- check_hh %>%
    select(TAZ, hh_2010,    hh_2015,    hh_2040, diff_1015, diff_1540)

# kable(check_hh, caption = "Zones with Inconsistent Households Trends", digits = 0, format.args = list(big.mark = ","))

datatable(check_hh, caption = "Zones with Inconsistent Households Trends")
# Save R Object file
 if (save.RData.outputs) {
   save(check_hh, file = "table_check_hh.RData")
 }
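
The trend check can be tried in isolation with the sketch below. The four zones and their household counts are made up for illustration and are not model data. The point of the sketch is that the comparison has to be element-wise (`&` and `|`) so that every zone is tested on its own.

```r
# Hypothetical household counts for four zones (not from the model data).
hh <- data.frame(
  TAZ     = c(1, 2, 3, 4),
  hh_2010 = c(100, 100, 100, 100),
  hh_2015 = c(120,  80, 110, 102),
  hh_2040 = c(200, 150, 100, 100)
)

diff_1015 <- hh$hh_2015 - hh$hh_2010
diff_1540 <- hh$hh_2040 - hh$hh_2015

# Element-wise test: the two intervals move in opposite directions.
reversal <- (diff_1015 > 0 & diff_1540 < 0) | (diff_1015 < 0 & diff_1540 > 0)

# Keep reversals only when both changes exceed 5 households.
flagged <- hh[reversal & abs(diff_1015) > 5 & abs(diff_1540) > 5, ]
print(flagged)
```

Zone 4 also reverses direction, but its changes are only 2 households, so the threshold drops it.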

2.2 Selected TAZs

The following table shows data for the selected TAZs: 76, 387, 979, 1596, 1598 and 2253. These zones were selected based on a past review of the 2015 zonal data. The current model data shows consistent growth rates for the household, population and employment variables across 2010, 2015 and 2040.

sel_data <- data_all_years %>% 
   filter(TAZ %in% check_taz) %>%
   select(TAZ, pop_2010, pop_2015, pop_2040, 
               emp_total_2010, emp_total_2015, emp_total_2040, 
               hh_2010, hh_2015,    hh_2040)

kable(sel_data, caption = "Selected TAZ from SERPM Data", digits = 0, format.args = list(big.mark = ","))
Selected TAZ from SERPM Data
  TAZ  pop_2010  pop_2015  pop_2040  emp_total_2010  emp_total_2015  emp_total_2040  hh_2010  hh_2015  hh_2040
-----  --------  --------  --------  --------------  --------------  --------------  -------  -------  -------
   76       953       994     1,207             119             123             142      569      597      745
  387       214       217       217             745             730             655       98       98       99
  979     1,689     1,684     1,690             243             244             251      788      787      780
1,596       499       505       499             402             417             493      241      241      244
1,598       307       382       656              27              28              31      165      173      241
2,253        34        80       277             265             267             275       15       33      122
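
For reference, the sketch below computes what a strict straight-line interpolation between 2010 and 2040 would give for 2015, using the population values from the table above. The weight of 5/30 is the standard linear-interpolation weight and is an assumption about how the 2015 data was built; the `pop_2015_interp` and `gap` columns are names introduced here for illustration.

```r
# Population for the selected TAZs, copied from the table above.
sel <- data.frame(
  TAZ      = c(76, 387, 979, 1596, 1598, 2253),
  pop_2010 = c(953, 214, 1689, 499, 307, 34),
  pop_2015 = c(994, 217, 1684, 505, 382, 80),
  pop_2040 = c(1207, 217, 1690, 499, 656, 277)
)

# Straight-line value for 2015: 5 of the 30 years between 2010 and 2040.
w <- (2015 - 2010) / (2040 - 2010)
sel$pop_2015_interp <- sel$pop_2010 + w * (sel$pop_2040 - sel$pop_2010)

# Gap between the reported 2015 value and the straight-line value.
sel$gap <- sel$pop_2015 - sel$pop_2015_interp
print(round(sel, 1))
```

The `gap` column shows how far each reported 2015 value sits from the straight line between its 2010 and 2040 values.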

2.3 Growth Rate Analysis

The figures below show growth rates by county for selected input variables: households (hh), population (pop), total employment (emp_total), college enrollment (college) and school enrollment (school).

  1. Miami-Dade County: The growth rate looks ok here. The hh, pop and emp growth rates are at 25 percent, college enrollment growth is at 20 percent and school enrollment growth is at 5 percent.

  2. Broward County: The hh growth rate looks ok, and pop seems a bit low at 12 percent, but employment is projected to grow by only 5 percent. This needs to be double-checked with FDOT. The same issue applies to college and school enrollment.

  3. Palm-Beach County: The growth rate looks ok here. The hh, pop and emp growth rates are at 30 percent, and college and school enrollment are at 30 and 15 percent respectively.
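
A county growth rate of this kind can be computed as in the sketch below. It assumes that "growth rate" means the total percent change in the county sum from 2010 to 2040; the document does not state the exact definition. The `county` column and all values are hypothetical.

```r
# Hypothetical TAZ-level totals; the county labels and all values are made up.
se <- data.frame(
  county   = c("A", "A", "B", "B"),
  pop_2010 = c(1000, 3000, 2000, 2000),
  pop_2040 = c(1300, 3700, 2100, 2300)
)

# Sum each year within county, then compute percent growth from 2010 to 2040.
tot <- aggregate(cbind(pop_2010, pop_2040) ~ county, data = se, FUN = sum)
tot$growth_pct <- 100 * (tot$pop_2040 - tot$pop_2010) / tot$pop_2010
print(tot)
```

Summing before dividing gives the growth of the county total, which is not the same as averaging zone-level growth rates.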


3. FDOT Official 2040 Data

The two official 2040 maz_data.csv files were downloaded from the FDOT website in June 2016:

  1. IN-2040T_data.csv : 2040 LRTP scenario data
  2. IN-2040R_data.csv : 2040 Cost Feasible scenario data

These two data sets are compared with the Corradino delivered 2040 data to make sure the model is using the official version and to document the source of model data being used for the corridor studies.

3.1 Compare 2040 CF with Corradino data

Several tabulations were computed to ensure that the model data being used for the corridor studies comes from the official FDOT Cost Feasible scenario. The table below compares the 2040 Cost Feasible data with the Corradino 2040 model data; there are zero differences. The chart below is a scatter plot of the population variable from the two data sets, which clearly shows that they are the same.

mgra  TAZ.Corradino  hh.Corradino  pop.Corradino  emp_total.Corradino  TAZ.FDOT  hh.FDOT  pop.FDOT  emp_total.FDOT  diff.hh  diff.pop  diff.emp  pop.bin
----  -------------  ------------  -------------  -------------------  --------  -------  --------  --------------  -------  --------  --------  -------

3.2 2040 LRTP vs 2040 Cost Feasible

The 2040 LRTP data differs significantly from the 2040 Cost Feasible data, and therefore from the Corradino data as well. There are too many zones to display all of the differences in tabular form. The plot below is a scatter plot of the population data from the two data sets.

mgra  TAZ.Corradino  hh.Corradino  pop.Corradino  emp_total.Corradino  TAZ.FDOT  hh.FDOT  pop.FDOT  emp_total.FDOT  diff.hh  diff.pop  diff.emp  pop.bin
----  -------------  ------------  -------------  -------------------  --------  -------  --------  --------------  -------  --------  --------  -----------
   1           2901            43            169                    0      2901       43       172               0        0        -3         0  -1000 to -1
   2           2902             9             23                 1337      2902        9        23            1337        0         0         0  0
   3           2903           497           1694                  379      2903      497      1685             379        0         9         0  1 to 1000
   4           2903           273            984                   21      2903      273      1001              21        0       -17         0  -1000 to -1
   5           2903           383           1306                   86      2903      383      1308              86        0        -2         0  -1000 to -1
   6           2903           212            861                   14      2903      212       841              14        0        20         0  1 to 1000
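
The comparison columns in the table above can be reproduced with the sketch below, which uses the population values of the six MAZ records shown. The join key and the suffixes follow the table's column names. The bin break points are an assumption chosen to match the three labels visible in the table, since the full binning scheme is not given.

```r
# Population for the six MAZ records shown above (Corradino vs. FDOT).
corradino <- data.frame(mgra = 1:6, pop = c(169, 23, 1694, 984, 1306, 861))
fdot      <- data.frame(mgra = 1:6, pop = c(172, 23, 1685, 1001, 1308, 841))

# Join on the MAZ id; the suffixes reproduce the column names in the table.
cmp <- merge(corradino, fdot, by = "mgra", suffixes = c(".Corradino", ".FDOT"))
cmp$diff.pop <- cmp$pop.Corradino - cmp$pop.FDOT

# Bin the differences. Break points are assumed; values outside the
# range -1000 to 1000 would become NA under this scheme.
cmp$pop.bin <- cut(cmp$diff.pop,
                   breaks = c(-1000.5, -0.5, 0.5, 1000.5),
                   labels = c("-1000 to -1", "0", "1 to 1000"))
print(cmp)

# Number of MAZs whose population differs between the two data sets.
n_diff <- sum(cmp$diff.pop != 0)
print(n_diff)
```

Running the same steps on the Cost Feasible file would be expected to give zero in every difference column, consistent with the finding in Section 3.1.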

4. STOPS MPO Data

The latest South East Florida Regional STOPS model was downloaded from the FDOT page and reviewed. As part of the review, the model landuse data and observed APC counts were checked. Since the Corridor studies use both the SERPM and STOPS models, it is important to ensure that the input data is consistent between them. The downloaded SEFL STOPS model contains 2010, 2015 and 2040 population and employment data at the TAZ level. According to the SEFL STOPS model documentation, the 2014 data is computed by interpolating between 2010 and 2040. The STOPS model uses only the population and employment variables; the household variable is not used and is therefore not provided in the data set.

This data is clearly different from SERPM 2015 MAZ data.

# Read data from stops input
shape <- readOGR(paste0(path,"/",stops.dir,"/",stops_mpo_shapeFile), layer = "simplified_MPOTAZPopEmp", verbose = FALSE)
stops_se <- shape@data

stops_sel_data <- stops_se %>%
  filter(TAZ_REG %in% check_taz) %>%
  select(TAZ_REG, POP_10,   POP_15, POP_40, 
                  TOTE_10, TOTE_15, TOTE_40)

kable(stops_sel_data, caption = "Selected TAZ from STOPS Data", digits = 0, format.args = list(big.mark = ","))
Selected TAZ from STOPS Data
TAZ_REG  POP_10  POP_15  POP_40  TOTE_10  TOTE_15  TOTE_40
-------  ------  ------  ------  -------  -------  -------
     76     953     994   1,202      119      123      142
    387     214     214     216      745      730      655
    979   1,689   1,689   1,691      243      244      251
  1,596     499     498     495      402      417      493
  1,598     307     364     649       27       28       31
  2,253      34      75     279      265      267      275
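
The two "Selected TAZ" tables in this document can be compared directly, as in the sketch below. The values are copied from those tables. The difference is taken as SERPM minus STOPS, and the names `diff_pop_15` and `diff_pop_40` are introduced here for illustration.

```r
# 2015 and 2040 population for the selected TAZs, copied from the two
# "Selected TAZ" tables in this document.
serpm <- data.frame(TAZ      = c(76, 387, 979, 1596, 1598, 2253),
                    pop_2015 = c(994, 217, 1684, 505, 382, 80),
                    pop_2040 = c(1207, 217, 1690, 499, 656, 277))
stops <- data.frame(TAZ_REG = c(76, 387, 979, 1596, 1598, 2253),
                    POP_15  = c(994, 214, 1689, 498, 364, 75),
                    POP_40  = c(1202, 216, 1691, 495, 649, 279))

# Join on the zone id, which is named differently in the two sources.
both <- merge(serpm, stops, by.x = "TAZ", by.y = "TAZ_REG")

# SERPM minus STOPS for each year.
both$diff_pop_15 <- both$pop_2015 - both$POP_15
both$diff_pop_40 <- both$pop_2040 - both$POP_40
print(both[, c("TAZ", "diff_pop_15", "diff_pop_40")])
```

For these six zones the employment columns are identical in the two tables, so only population is differenced here.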

4.1 2015 population and employment difference

The table below shows the number of TAZs by range of population and employment difference. TAZs with no difference are not tabulated.

bin            diff.pop_15  diff_emp_15
-------------  -----------  -----------
-5000 to -500            5           NA
-500 to -100            21           NA
-100 to -50             14           NA
-50 to -20              50           NA
-20 to 0              1445          177
0 to 20               1496          885
20 to 50                88            1
50 to 100               44           NA
100 to 500              27            2
500 to 5000              2            1
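
A tabulation like this can be produced with `cut()` and `table()`, as sketched below. The difference values are made up. The bin edges are taken from the labels in the table, but the document does not say how values that fall exactly on an edge are assigned, so the right-closed intervals used here are an assumption.

```r
# Hypothetical 2015 population differences by TAZ (values are made up).
diff_pop_15 <- c(-600, -120, -30, -5, 0, 0, 4, 15, 35, 70, 250, 900)

# Bin edges taken from the labels in the table above. Right-closed
# intervals (cut's default) are an assumption.
edges  <- c(-5000, -500, -100, -50, -20, 0, 20, 50, 100, 500, 5000)
labels <- paste(head(edges, -1), "to", tail(edges, -1))

# Zero differences are not tabulated.
nonzero <- diff_pop_15[diff_pop_15 != 0]

# cut() returns a factor whose levels follow the order of the edges,
# so the counts print in numeric bin order rather than alphabetically.
bins <- cut(nonzero, breaks = edges, labels = labels)
counts <- table(bins)
print(counts)
```

Because the levels are defined up front, empty bins are reported with a count of zero instead of being dropped.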

The plot below shows population difference between the two data sets.

The plot below shows employment difference between the two data sets.

4.2 Population Difference

The map below shows the population difference between the two data sets by TAZ for 2010, 2015 and 2040. The 2010 population is the same in the two models (SERPM and STOPS), whereas the 2040 population differs across most TAZs. The 2040 difference likely carried over into 2015 through the interpolation.

4.3 Employment Difference

The 2010 employment data is the same in the two models (SERPM and STOPS), whereas the 2015 data differs across most TAZs. Five TAZs show different employment for 2040; the table below lists them.

 tabulate_emp_diff_2040 <- df %>% 
      filter(diff_emp_40 != 0) %>%
      select(TAZ, emp_total_2040, TOTE_40, diff_emp_40)
 
  kable(tabulate_emp_diff_2040)
 TAZ  emp_total_2040  TOTE_40  diff_emp_40
----  --------------  -------  -----------
 825            8160      250         7910
 854              89        0           89
 864            3070      386         2684
1058            3000        0         3000
2409            1831     1678          153

The map below shows the employment difference between the two data sets by TAZ for 2010, 2015 and 2040.